On the Derivation Perplexity of Treebanks
Abstract
Parsing performance is typically assumed to correlate with treebank size and morphological complexity [6, 13]. This paper shows a strong correlation between derivation perplexity and parsing performance across both morphologically rich and morphologically poor languages. Since derivation perplexity is orthogonal to morphological complexity, this result calls into question the importance of morphological complexity. We also show that derivation perplexity can be used to evaluate parsers. Its main advantage as an evaluation metric is that it measures global aspects of parser output (as exact-match scores do), yet it is fine-grained enough to yield significant results on small standard test sets (as attachment scores do).
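The abstract does not spell out how derivation perplexity is computed, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes that a derivation is a sequence of hypothetical (head POS tag, dependent POS tag) events, that the model is an add-alpha smoothed conditional distribution p(dependent | head) estimated from training derivations, and that perplexity is the usual per-event quantity exp(-(1/N) Σ ln p). The names `derivation_perplexity`, `pearson`, and `Event` are illustrative; `pearson` shows how perplexity values could be correlated with parser scores.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical event type: (head POS tag, dependent POS tag).
Event = Tuple[str, str]


def derivation_perplexity(
    train: Sequence[List[Event]],
    test: Sequence[List[Event]],
    tagset: Set[str],
    alpha: float = 1.0,
) -> float:
    """Per-event perplexity of `test` derivations under an add-alpha
    smoothed model p(dependent | head) estimated from `train`.

    Assumes alpha > 0 and that every dependent tag in `test` belongs to
    `tagset`, so the smoothed distribution sums to one over `tagset`.
    """
    pair_counts = Counter(event for d in train for event in d)
    head_counts = Counter(head for d in train for head, _ in d)
    v = len(tagset)
    log_prob, n_events = 0.0, 0
    for derivation in test:
        for head, dep in derivation:
            # Add-alpha estimate of p(dep | head); unseen heads fall back to 1/v.
            p = (pair_counts[(head, dep)] + alpha) / (head_counts[head] + alpha * v)
            log_prob += math.log(p)
            n_events += 1
    if n_events == 0:
        raise ValueError("test set contains no derivation events")
    # Perplexity = exp(-(1/N) * sum of natural-log probabilities).
    return math.exp(-log_prob / n_events)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    tags = {"VERB", "NOUN", "DET"}
    toy_train = [[("VERB", "NOUN"), ("NOUN", "DET")]] * 10
    toy_test = [[("VERB", "NOUN"), ("NOUN", "DET")]]
    print(derivation_perplexity(toy_train, toy_test, tags))  # 13/11, about 1.18
```

To mirror the kind of analysis the abstract describes, one would compute `derivation_perplexity` on each treebank's test split and pass those values, together with the corresponding parser scores, to `pearson`. A richer derivation model (for example, one that also conditions on attachment direction) would slot in by changing the event type and the probability estimate.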
Similar resources
Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited
This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Princ...
Data point selection for cross-language adaptation of dependency parsers
We consider a very simple, yet effective, approach to cross-language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from least similar to most similar to the ta...
An Empirical Study of Differences between Conversion Schemes and Annotation Guidelines
We establish quantitative methods for comparing and estimating the quality of dependency annotations or conversion schemes. We use generalized tree-edit distance to measure divergence between annotations and propose theoretical learnability, derivational perplexity and downstream performance for evaluation. We present systematic experiments with tree-to-dependency conversions of the PennIII tree...
Workshop on High-level Methodologies for Grammar Engineering @ ESSLLI 2013 Organization Executive Committee Program Committee a Type-logical Treebank for French
In this article, we describe the way we use hierarchical clustering to learn an AB grammar from partial derivation trees. We describe AB grammars and the derivation trees we use as input for the clustering, then the way we extract information from treebanks for the clustering. The unification algorithm, based on the information extracted from our clusters, will be explained and the results disc...
The Galactic Dependencies Treebanks: Getting More Data by Synthesizing New Languages
We release Galactic Dependencies 1.0—a large set of synthetic languages not found on Earth, but annotated in Universal Dependencies format. This new resource aims to provide training and development data for NLP methods that aim to adapt to unfamiliar languages. Each synthetic treebank is produced from a real treebank by stochastically permuting the dependents of nouns and/or verbs to match the...
Publication date: 2010